feat: automatic retry and failover for rate-limited LLM requests#733
raheelshahzad wants to merge 2 commits into katanemo:main
Conversation
Thanks a lot for putting this change together @raheelshahzad. Please join our Discord channel too. Overall looks good!

I left some comments in the PR, and have some additional suggestions/comments on the overall change:
- we should do exponential backoff on retries
- how do we ensure that we have not exceeded the request timeout?
- `max_retries` should be defined somewhere in config.yaml; probably not in this PR, but we should let developers define that variable
- this code change needs an update to the docs
- I think we should allow retry to the same provider, or at least let developers define whether they want to retry to a different provider. Consider the following example:
```yaml
model_providers:
  - model: openai/gpt-4o
    base_url: https://dsna-oai.openai.azure.com
    access_key: $OPENAI_API_KEY
    retry_on_ratelimit: true       # new feature
    retry_to_same_provider: true   # this flag should only allow retry to the same provider; otherwise we should retry randomly across all models
  - model: openai/gpt-5
    base_url: https://dsna-oai.openai.azure.com
    access_key: $OPENAI_API_KEY
```
crates/common/src/llm_providers.rs
Outdated
```rust
self.providers.iter().find_map(|(key, provider)| {
    if provider.internal != Some(true)
        && provider.name != current_name
        && key == &provider.name
    {
        Some(Arc::clone(provider))
    } else {
        None
    }
})
```
should pick random model
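One way to honor this: collect every eligible alternative first, then pick one at random instead of taking the first match. A minimal sketch below uses a hypothetical `Provider` struct and a clock-derived index as a dependency-free stand-in for a real RNG (the actual code would likely use the `rand` crate):

```rust
use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical, pared-down provider record for illustration.
#[derive(Debug)]
struct Provider {
    name: String,
    internal: Option<bool>,
}

/// Collect all eligible alternatives, then pick one pseudo-randomly.
/// The subsec-nanos index is a stand-in for a proper RNG such as
/// `rand::seq::SliceRandom::choose`.
fn pick_alternative(providers: &[Arc<Provider>], current_name: &str) -> Option<Arc<Provider>> {
    let candidates: Vec<&Arc<Provider>> = providers
        .iter()
        .filter(|p| p.internal != Some(true) && p.name != current_name)
        .collect();
    if candidates.is_empty() {
        return None;
    }
    let idx = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .subsec_nanos() as usize
        % candidates.len();
    Some(Arc::clone(candidates[idx]))
}

fn main() {
    let providers = vec![
        Arc::new(Provider { name: "openai/gpt-4o".into(), internal: None }),
        Arc::new(Provider { name: "openai/gpt-5".into(), internal: None }),
    ];
    // The current model is excluded, so the only candidate is gpt-5.
    let alt = pick_alternative(&providers, "openai/gpt-4o").unwrap();
    assert_ne!(alt.name, "openai/gpt-4o");
    println!("failover to {}", alt.name);
}
```

Collecting candidates before choosing keeps the selection uniform across all eligible providers, rather than biased toward map iteration order.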
```rust
if res.status() == StatusCode::TOO_MANY_REQUESTS && attempts < max_attempts {
    let providers = llm_providers.read().await;
    if let Some(provider) = providers.get(&current_resolved_model) {
        if provider.retry_on_ratelimit == Some(true) {
            if let Some(alt_provider) = providers.get_alternative(&current_resolved_model) {
                info!(
                    request_id = %request_id,
                    current_model = %current_resolved_model,
                    alt_model = %alt_provider.name,
                    "429 received, retrying with alternative model"
                );
                current_resolved_model = alt_provider.name.clone();
                continue;
            }
        }
    }
}
```
we need to add exponential backoff
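A minimal sketch of what that could look like, using the 25ms base / 250ms cap discussed later in this thread (both values are assumptions here, not part of the current diff). Jitter is omitted for brevity; a real implementation would add it on top:

```rust
use std::time::Duration;

/// Exponential backoff: base * 2^attempt, capped at `max`.
/// A real implementation would also add jitter to avoid
/// synchronized retry storms across concurrent requests.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Clamp the shift so the multiplier cannot overflow u32.
    let delay = base.saturating_mul(1u32 << attempt.min(16));
    delay.min(max)
}

fn main() {
    let base = Duration::from_millis(25);
    let max = Duration::from_millis(250);
    for attempt in 0..5 {
        // 25ms, 50ms, 100ms, 200ms, then capped at 250ms
        println!("attempt {attempt}: sleep {:?}", backoff_delay(attempt, base, max));
    }
}
```

The cap matters as much as the growth: without it, a few retries against a slow provider can blow past the overall request timeout raised above.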
```rust
let mut current_resolved_model = resolved_model.clone();
let mut current_client_request = client_request;
let mut attempts = 0;
let max_attempts = 2; // Original + 1 retry
```
this should be configurable
```rust
);
// Capture start time right before sending request to upstream
let request_start_time = std::time::Instant::now();
let _request_start_system_time = std::time::SystemTime::now();
```
I looked through the Envoy retry semantics (https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-field-config-route-v3-routeaction-retry-policy) and I think we should lean toward this design for retries. We don't have to implement it completely, but we should implement a bare minimum following similar semantics/config. Thoughts?
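For reference, the Envoy route-level retry policy the comment points at looks roughly like this (field names are from Envoy's `RetryPolicy` proto; the concrete values here are illustrative, not from this PR):

```yaml
retry_policy:
  retry_on: "retriable-status-codes"   # which failures trigger a retry
  retriable_status_codes: [429, 503]
  num_retries: 2
  per_try_timeout: 0.5s                # bounds each attempt, not the whole request
  retry_back_off:
    base_interval: 0.025s
    max_interval: 0.25s                # Envoy defaults max to 10x base if unset
```

Mirroring these names (`num_retries`, `retry_on`, `base_interval`, `max_interval`) would give developers coming from Envoy a familiar vocabulary.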
raheelshahzad force-pushed: d1aa3ac to ca903d2
raheelshahzad left a comment
- Exponential backoff with configurable base and max intervals.
- Configurable `max_retries`.
- `retry_to_same_provider` option.
- Random alternative selection when failing over to a different model.
- Documentation updates in the reference configuration.
- Comprehensive unit tests for all of the above.
Thanks a lot Raheel for continuing to make plano better. We are getting there. This may be a slightly better way to specify retries:

```yaml
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
    retry_policy:
      num_retries: 2
      # retry_on: [429]             # default
      # back_off:
      #   base_interval: 25ms       # default
      #   max_interval: 250ms      # default (10x base)
      # failover:
      #   strategy: same_provider   # default

  # Need more control
  - model: anthropic/claude-sonnet-4-0
    access_key: $ANTHROPIC_API_KEY
    retry_policy:
      num_retries: 3
      failover:
        strategy: any

  # Full control
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    retry_policy:
      num_retries: 2
      retry_on: [429, 503]
      back_off:
        base_interval: 100ms
        max_interval: 2000ms
      failover:
        providers:
          - anthropic/claude-sonnet-4-0

  # No retries (default; just omit retry_policy)
  - model: mistral/ministral-3b-latest
    access_key: $MISTRAL_API_KEY
```
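A rough sketch of Rust types this schema could deserialize into. All names and defaults are taken from the proposal itself (25ms base, 250ms cap, `retry_on: [429]`, `same_provider` strategy); the types are hypothetical, and the real code would derive `serde::Deserialize` with duration parsing:

```rust
use std::time::Duration;

// Hypothetical config types mirroring the proposed `retry_policy` YAML.
#[derive(Debug, Clone)]
enum FailoverStrategy {
    SameProvider,          // retry the same provider only (default)
    Any,                   // pick any alternative at random
    Providers(Vec<String>), // explicit ordered fallback list
}

#[derive(Debug, Clone)]
struct BackOff {
    base_interval: Duration,
    max_interval: Duration,
}

impl Default for BackOff {
    fn default() -> Self {
        BackOff {
            base_interval: Duration::from_millis(25),
            max_interval: Duration::from_millis(250), // 10x base
        }
    }
}

#[derive(Debug, Clone)]
struct RetryPolicy {
    num_retries: u32,
    retry_on: Vec<u16>,
    back_off: BackOff,
    failover: FailoverStrategy,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        RetryPolicy {
            num_retries: 0, // omitting retry_policy means no retries
            retry_on: vec![429],
            back_off: BackOff::default(),
            failover: FailoverStrategy::SameProvider,
        }
    }
}

fn main() {
    // "Need more control" example: 3 retries, failover to any provider.
    let policy = RetryPolicy {
        num_retries: 3,
        failover: FailoverStrategy::Any,
        ..Default::default()
    };
    assert_eq!(policy.retry_on, vec![429]);
    println!("{policy:?}");
}
```

Making every field past `num_retries` default-able keeps the simple case (the gpt-4o block above) a two-line addition.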
I like this developer experience and would love to see an updated PR for it. This would help with free-tier GPU traffic shaping and would be a very useful feature for coding agents.
raheelshahzad force-pushed: ca903d2 to 1384982

raheelshahzad force-pushed: 1384982 to d569d4f
Implement a retry-on-ratelimit system for the Plano gateway that automatically retries failed LLM requests (429, 503, timeouts) across alternative providers with intelligent provider selection.

Core modules (crates/common/src/retry/):
- orchestrator: retry loop with budget tracking and attempt management
- provider_selector: weighted selection excluding blocked providers
- error_detector: classifies responses into retryable error categories
- backoff: exponential backoff with jitter and Retry-After support
- retry_after_state: per-provider rate-limit cooldown tracking
- latency_block_state: high-latency provider temporary exclusion
- latency_trigger: consecutive slow-response counter
- validation: configuration validation with cross-field checks
- error_response: structured error responses when retries exhausted

Three phases: P0 (core retry + backoff), P1 (Retry-After + fallback models + timeout), P2 (proactive high-latency failover). Tests follow in a separate PR.
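The Retry-After support the backoff module mentions can be sketched as follows. This handles only the delta-seconds form of the header; the function name, the cap parameter, and the omission of the HTTP-date form are all simplifications, not the PR's actual implementation:

```rust
use std::time::Duration;

/// Parse the delta-seconds form of a `Retry-After` header value
/// (e.g. "120"), capping the result so a hostile or buggy upstream
/// cannot stall the gateway indefinitely. The HTTP-date form
/// ("Wed, 21 Oct 2015 07:28:00 GMT") is not handled in this sketch.
fn parse_retry_after(value: &str, cap: Duration) -> Option<Duration> {
    value
        .trim()
        .parse::<u64>()
        .ok()
        .map(|secs| Duration::from_secs(secs).min(cap))
}

fn main() {
    let cap = Duration::from_secs(30);
    assert_eq!(parse_retry_after("2", cap), Some(Duration::from_secs(2)));
    assert_eq!(parse_retry_after("120", cap), Some(cap)); // capped
    assert_eq!(parse_retry_after("not-a-number", cap), None);
    println!("ok");
}
```

When the header parses, its value would override the computed exponential delay for that provider; when it does not, the orchestrator falls back to plain backoff.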
…elimit

Add 302 property-based unit tests (proptest, 100+ iterations each) and 13 integration test scenarios covering all retry behaviors.

Unit tests cover:
- Configuration round-trip parsing, defaults, and validation
- Status code range expansion and error classification
- Exponential backoff formula, bounds, and scope filtering
- Provider selection strategy correctness and fallback ordering
- Retry-After state scope behavior and max expiration updates
- Cooldown exclusion invariants and initial selection cooldown
- Bounded retry (max_attempts + budget enforcement)
- Request preservation across retries
- Latency trigger sliding window and block state management
- Timeout vs high-latency precedence
- Error response detail completeness

Integration tests (tests/e2e/):
- IT-1 through IT-13 covering 429/503 retry, exhaustion, backoff, fallback priority, Retry-After honoring, timeout retry, high-latency failover, streaming preservation, and body preservation
raheelshahzad force-pushed: d569d4f to 98bf024
Summary
Adds a retry-on-ratelimit system to the Plano gateway that automatically retries failed LLM requests (429, 503, timeouts) across alternative providers with intelligent selection.
Structure (2 commits)
Commit 1 — Production code (~4k lines)
Core retry engine in crates/common/src/retry/:
- orchestrator: retry loop with budget tracking
- provider_selector: weighted selection excluding blocked providers
- error_detector: classifies responses into retryable categories
- backoff: exponential backoff with jitter + Retry-After support
- retry_after_state: per-provider rate-limit cooldown tracking
- latency_block_state: high-latency provider temporary exclusion
- latency_trigger: consecutive slow-response counter
- validation: config validation with cross-field checks
- error_response: structured error responses when retries exhausted

Three phases: P0 (core retry + backoff), P1 (Retry-After + fallback models + timeout), P2 (proactive high-latency failover).
Commit 2 — Tests (~10.9k lines)
- 302 property-based unit tests (proptest, 100+ iterations each)